Basic Plot with no Data

  • Review:
    • figure()
    • output_notebook() or output_file()
    • show()
In [4]:
from bokeh.io import output_notebook, show
from bokeh.plotting import figure

p = figure(plot_width=300, plot_height=300)
output_notebook()
show(p)
Loading BokehJS ...
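The review list mentions output_file() as the alternative to output_notebook(). A minimal sketch of that path follows, assuming you want a standalone HTML file instead of inline notebook output. The temporary directory, the file name, and the sample line data are made up for illustration, and save() is used in place of show() so that nothing tries to open a browser.

```python
import os
import tempfile

from bokeh.io import output_file, save
from bokeh.plotting import figure

# A small figure with one line glyph so the saved page is not empty.
p = figure(title="Saved to a file")
p.line([1, 2, 3], [4, 6, 5])

# Hypothetical output location: a temporary directory chosen for this sketch.
out_path = os.path.join(tempfile.mkdtemp(), "basic_plot.html")

# Direct output to an HTML file rather than the notebook.
output_file(out_path)

# save() writes the file and returns its path; show(p) would also open it.
saved_path = save(p)
```

Replacing save(p) with show(p) should write the same file and then open it in a browser tab.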

Plotting Using NumPy Arrays

  • NumPy is the most widely used package for scientific computing in Python
  • It provides a multidimensional array object, the ndarray
  • bokeh accepts ndarray objects as input
In [5]:
import numpy as np
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.io import output_notebook, show

p = figure(plot_width=300, plot_height=300)

# x and y are numpy arrays
x = np.linspace(0, 10, 101)
y = np.exp(x)

# the circle glyph accepts the NumPy arrays directly
p.circle(x, y)

output_notebook()
show(p)

print(type(x))
print(type(y))
Loading BokehJS ...
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
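To make the two arrays in the cell above less opaque, here is a small NumPy-only sketch that inspects them. It only restates what linspace and exp produce; nothing here is specific to bokeh.

```python
import numpy as np

# 101 evenly spaced points from 0 to 10 inclusive, so the step is 0.1.
x = np.linspace(0, 10, 101)

# np.exp is applied element by element, so y has the same shape as x.
y = np.exp(x)

print(x.shape)       # (101,)
print(x.dtype)       # float64
print(y.shape == x.shape)
```

Because both arrays have the same length, each (x[i], y[i]) pair becomes one marker on the plot.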

Using Pandas DataFrames

  • pandas is a Python package for working with relational (tabular) data
  • It is built on top of the NumPy library
  • The pandas DataFrame is its primary data structure
    • analogous in appearance to an Excel spreadsheet or an R data frame
In [6]:
from bokeh.plotting import figure
from bokeh.io import output_notebook, show

ghg = building_data['GHG_Emissions']  # x-values
sqft = building_data['Gross_Sq_Ft']  # y-values

# Set up the figure
p = figure(plot_width=500,
           plot_height=300,
           x_axis_label='Greenhouse Gas Emissions',
           y_axis_label='Gross Square Feet')

p.circle(ghg, sqft)

output_notebook()
show(p)

print(type(ghg))
print(type(sqft))
Loading BokehJS ...
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
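The cell above relies on a building_data DataFrame that is loaded elsewhere. The sketch below uses a hypothetical stand-in with the same two column names to show what selecting a column returns; every value in it is invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for building_data: the column names come from the
# notebook, but all values below are made up.
building_data = pd.DataFrame({
    'Property_Name': ['Building A', 'Building B', 'Building C'],
    'GHG_Emissions': [120.5, 310.0, 89.2],
    'Gross_Sq_Ft': [52000, 148000, 41000],
})

# Selecting one column with [] returns a pandas Series.
ghg = building_data['GHG_Emissions']

# The Series wraps a NumPy array, which is available through .values.
ghg_array = ghg.values

print(type(ghg))
print(type(ghg_array))
```

This is why a Series can be handed to a glyph in the same way as the NumPy arrays in the previous section.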

Bokeh's ColumnDataSource

  • Behind the scenes, bokeh converts these data formats (NumPy arrays, pandas Series) into its own main data format, the ColumnDataSource
  • ColumnDataSource is the main data structure in bokeh
  • Its data attribute is a dictionary that maps a string name to a sequence of data
    • For a pandas DataFrame, the string name is the column name and the sequence is that column's values

Equivalent DataFrame and ColumnDataSource for comparison

In [26]:
import pandas as pd

table = pd.DataFrame(data=[['Greg', 2, 68], ['Tim', 4, 70]],
                     columns=['name', 'number', 'height'])

print(table)
   name  number  height
0  Greg       2      68
1   Tim       4      70
In [25]:
table = ColumnDataSource(data={
    'name': ['Greg', 'Tim'],
    'number': [2, 4],
    'height': [68, 70],
})

table.data
Out[25]:
{'height': [68, 70], 'name': ['Greg', 'Tim'], 'number': [2, 4]}
  • Benefits of the ColumnDataSource:
    • Can be used to link selections between plots
    • Can be used to create extra hover tooltips
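Both benefits listed above can be sketched with the small name/number/height table. This is a minimal illustration under a few assumptions: scatter is used as the marker glyph, figure sizes are left at their defaults, and the tooltip fields are chosen arbitrarily. Two glyphs share one ColumnDataSource, which is what links their selections, and a HoverTool reads extra columns from the same source.

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

source = ColumnDataSource(data={
    'name': ['Greg', 'Tim'],
    'number': [2, 4],
    'height': [68, 70],
})

# Two figures with a box-select tool each.
left = figure(tools="box_select,reset", title="number vs. height")
right = figure(tools="box_select,reset", title="height vs. number")

# Both glyphs read from the same source, so a selection made in one plot
# is reflected in the other.
left_renderer = left.scatter('number', 'height', source=source, size=10)
right_renderer = right.scatter('height', 'number', source=source, size=10)

# "@name" and "@height" refer to columns of the shared source.
for plot in (left, right):
    plot.add_tools(HoverTool(tooltips=[("name", "@name"),
                                       ("height", "@height")]))
```

Displaying the two figures side by side, for example with row(left, right) from bokeh.layouts, makes the linked selection visible.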

Transform a Pandas DataFrame to a ColumnDataSource

In [29]:
# pass the pandas DataFrame building_data to the ColumnDataSource constructor
building_cds = ColumnDataSource(building_data)
building_cds.data.keys()  # the keys are the DataFrame's column names, plus its index
Out[29]:
dict_keys(['Property_Name', 'Address', 'ZIP', 'Tax_Parcel', 'Property_Type', 'Gross_Sq_Ft', 'Property_Uses', 'Site_EUI', 'EnergyStar_Score', 'EnergyStar_Certified', 'Year_Built', 'GHG_Emissions', 'GHG_Intensity', 'Site_Energy_Use', 'Percent_Electr', 'Percent_Gas', 'Percent_Steam', 'Water_Intensity', 'OnSite_Solar', 'Owner_Submitted_Info', 'Owner_Submitted_Link', 'built_before', 'index'])
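The same conversion can be tried on the small table from the comparison above, which makes the extra 'index' key easier to spot. This is a sketch, assuming a DataFrame with a default, unnamed index.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

table = pd.DataFrame(data=[['Greg', 2, 68], ['Tim', 4, 70]],
                     columns=['name', 'number', 'height'])

# Passing the DataFrame to the constructor converts each column.
cds = ColumnDataSource(table)

# Besides the three columns, the unnamed DataFrame index is stored
# under the key 'index'.
keys = sorted(cds.data.keys())
numbers = list(cds.data['number'])

print(keys)
print(numbers)
```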

Plotting with the ColumnDataSource

  • Pass the dictionary keys (column names) as strings to the x and y parameters of the circle glyph
  • GHG_Emissions and Gross_Sq_Ft are keys in the ColumnDataSource
  • Set the source parameter to the ColumnDataSource object building_cds
  • NOTE: the glyph now pulls its data from the ColumnDataSource object, NOT the pandas DataFrame
In [12]:
from bokeh.plotting import figure
from bokeh.io import output_notebook, show

# Set up the figure
p = figure(plot_width=500,
           plot_height=300,
           x_axis_label='Greenhouse Gas Emissions',
           y_axis_label='Gross Square Feet')

p.circle('GHG_Emissions', 'Gross_Sq_Ft', source=building_cds)

output_notebook()
show(p)

print(type(building_cds))
Loading BokehJS ...
<class 'bokeh.models.sources.ColumnDataSource'>

Color Mapping

  • You can color points based on categorical values
  • from bokeh.models import CategoricalColorMapper
    • CategoricalColorMapper inputs:
      • factors - the list of category values
      • palette - the list of colors, matched to the factors in order
  • Pass a dictionary to the glyph's color property:
    • field - the name of the column to map
    • transform - the color mapper to apply to that column's values
In [34]:
from bokeh.plotting import figure
from bokeh.io import output_notebook, show
from bokeh.models import CategoricalColorMapper

# Set up the figure
p_basic = figure(plot_width=500,
           plot_height=300,
           x_axis_label='Greenhouse Gas Emissions',
           y_axis_label='Gross Square Feet')

# Create the CategoricalColorMapper object
color_mapper = CategoricalColorMapper(factors=['Built after 1950', 'Built before 1950'],
                                      palette=['red', 'blue'])

p_basic.circle(x='GHG_Emissions',
               y='Gross_Sq_Ft',
               source=building_cds,
               color=dict(field='built_before', transform=color_mapper),
               legend='built_before')

output_notebook()
show(p_basic)

print(type(building_cds))
Loading BokehJS ...
<class 'bokeh.models.sources.ColumnDataSource'>
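Conceptually, the mapper above pairs each factor with the palette entry at the same position. The plain-Python sketch below imitates that lookup so the pairing is easy to see. It is an approximation for illustration, not bokeh's implementation (the real mapping is carried out in the browser), and the fallback color and sample values are assumptions.

```python
factors = ['Built after 1950', 'Built before 1950']
palette = ['red', 'blue']

# Assumed fallback for values that are not listed in factors.
FALLBACK_COLOR = 'gray'

# Pair the i-th factor with the i-th palette color.
lookup = dict(zip(factors, palette))


def map_colors(values):
    """Return one color per value, using the fallback for unknown values."""
    return [lookup.get(value, FALLBACK_COLOR) for value in values]


# Hypothetical column values, including one that is not a known factor.
sample = ['Built before 1950', 'Built after 1950', 'Unknown']
colors = map_colors(sample)
print(colors)  # ['blue', 'red', 'gray']
```

The order of factors therefore matters: swapping the two factor strings would swap which group is drawn in red.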

Using an imported color palette

  • bokeh includes a number of built-in color palettes that you can import
  • from bokeh.palettes import Colorblind
  • Index the palette by the number of colors you want, e.g. Colorblind[3]
In [38]:
from bokeh.plotting import figure
from bokeh.io import output_notebook, show
from bokeh.models import CategoricalColorMapper
from bokeh.palettes import Colorblind, viridis

# Set up the figure
p_cat = figure(plot_width=500,
           plot_height=300,
           x_axis_label='Greenhouse Gas Emissions',
           y_axis_label='Gross Square Feet')

# create a list of unique values
built_before_list = list(building_cds.data['built_before'].unique())

# Create the CategoricalColorMapper object
color_mapper = CategoricalColorMapper(factors=built_before_list,
                                      palette=Colorblind[3])

p_cat.circle(x='GHG_Emissions',
         y='Gross_Sq_Ft',
         source=building_cds,
         color=dict(field='built_before', transform=color_mapper),
         legend='built_before')

output_notebook()
show(p_cat)

print(type(building_cds))
Loading BokehJS ...
<class 'bokeh.models.sources.ColumnDataSource'>
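The cell above hard-codes Colorblind[3]. A possible variation, sketched here with a hypothetical two-item factor list, picks the palette size from the number of factors and trims it to match. It assumes the number of factors does not exceed the largest size Colorblind offers.

```python
from bokeh.palettes import Colorblind

# Hypothetical factor list standing in for built_before_list.
factors = ['Built after 1950', 'Built before 1950']

# Colorblind is keyed by palette size; use the smallest available size
# when there are fewer factors than that.
smallest = min(Colorblind.keys())
n_colors = max(smallest, len(factors))

# Trim the palette so there is exactly one color per factor.
palette = list(Colorblind[n_colors])[:len(factors)]
print(palette)
```

The resulting list can be passed as the palette argument of CategoricalColorMapper together with the same factors.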

Color Palette Structure

  • The palettes themselves are just lists of hexadecimal RGB color strings
In [39]:
print(Colorblind[3])
print(viridis(6))
['#0072B2', '#E69F00', '#F0E442']
['#440154', '#404387', '#29788E', '#22A784', '#79D151', '#FDE724']
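Each string printed above packs the red, green, and blue channels as two hexadecimal digits apiece. The helper below, a hypothetical name used only for this sketch, decodes the three Colorblind colors to make that structure concrete.

```python
def hex_to_rgb(color):
    """Convert a '#RRGGBB' string into an (R, G, B) tuple of integers."""
    digits = color.lstrip('#')
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


palette = ['#0072B2', '#E69F00', '#F0E442']
rgb = [hex_to_rgb(color) for color in palette]
print(rgb)  # [(0, 114, 178), (230, 159, 0), (240, 228, 66)]
```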

Larger Palettes

The bokeh.palettes module also has some larger palettes with 256 colors.

  • Functions such as viridis, grey, and inferno take a number n and return n colors drawn from the corresponding large palette

In [45]:
from bokeh.plotting import figure
from bokeh.io import output_notebook, show
from bokeh.models import CategoricalColorMapper
from bokeh.palettes import viridis, grey, inferno

# Set up the figure
p_scale = figure(plot_width=500,
           plot_height=300,
           x_axis_label='Greenhouse Gas Emissions',
           y_axis_label='Gross Square Feet')

# create a sorted list of unique values
year_built_list = sorted(list(building_cds.data['Year_Built'].unique()))

# Create the CategoricalColorMapper object
color_mapper = CategoricalColorMapper(factors=year_built_list,
                                      palette=viridis(len(year_built_list)))  # one viridis color per year

p_scale.circle(x='GHG_Emissions',
         y='Gross_Sq_Ft',
         source=building_cds,
         color=dict(field='Year_Built', transform=color_mapper),
         # legend='Year_Built'
        )

output_notebook()
show(p_scale)

print(type(building_cds))
Loading BokehJS ...
<class 'bokeh.models.sources.ColumnDataSource'>
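The cell above treats every year as its own category. If Year_Built is stored as a number, a continuous mapping is another option. The sketch below uses the linear_cmap helper from bokeh.transform with the 256-color Viridis256 palette. The data values are invented, scatter stands in for the circle glyph, and this is an alternative to the notebook's approach, not a description of it.

```python
from bokeh.models import ColumnDataSource
from bokeh.palettes import Viridis256
from bokeh.plotting import figure
from bokeh.transform import linear_cmap

# Hypothetical values; only the column names come from the notebook.
source = ColumnDataSource(data={
    'GHG_Emissions': [120.5, 310.0, 89.2, 205.7],
    'Gross_Sq_Ft': [52000, 148000, 41000, 97000],
    'Year_Built': [1910, 1948, 1975, 2004],
})

years = source.data['Year_Built']

# Map the numeric column linearly onto the palette between low and high.
year_colors = linear_cmap('Year_Built', Viridis256,
                          low=min(years), high=max(years))

p_linear = figure(x_axis_label='Greenhouse Gas Emissions',
                  y_axis_label='Gross Square Feet')
renderer = p_linear.scatter('GHG_Emissions', 'Gross_Sq_Ft',
                            source=source, color=year_colors, size=10)
```

With this mapping the oldest buildings take the first palette color and the newest take the last, with no need to list each year as a factor.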

Bokeh Layouts

  • row function - arranges plots & menu objects (widgets) side by side in a row
  • column function - stacks plots & menu objects (widgets) vertically in a column

Row Layout

In [37]:
from bokeh.layouts import row, column

layout = row(p_basic, p_cat, p_scale)

show(layout)

Column Layout

In [38]:
from bokeh.layouts import row, column

layout = column(p_basic, p_cat, p_scale)

show(layout)

Combination Layout

In [39]:
from bokeh.layouts import row, column

layout = column(row(p_basic, p_cat), p_scale)

show(layout)
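The three layouts above reuse plots built in earlier cells. A self-contained sketch of the combination layout follows, using three throwaway line plots (the helper name and data are made up) so the nesting can be inspected without the building data.

```python
from bokeh.layouts import column, row
from bokeh.plotting import figure


def small_plot(title):
    """Build a placeholder figure with a single line glyph."""
    p = figure(title=title)
    p.line([1, 2, 3], [1, 4, 9])
    return p


a, b, c = small_plot('a'), small_plot('b'), small_plot('c')

# A column whose first child is itself a row: a and b side by side,
# with c underneath.
layout = column(row(a, b), c)

top_row = layout.children[0]
print(len(layout.children), len(top_row.children))
```

Passing layout to show() would render the nested arrangement in the same way as the cells above.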

Creating grid plots

  • The benefit is a single shared toolbar for all the plots
In [46]:
from bokeh.layouts import gridplot

grid_layout = gridplot([[p_basic, p_cat], [p_scale, None]])

output_notebook()
show(grid_layout)
Loading BokehJS ...

Allow for scaling of the plots

  • Use sizing_mode to scale the plots automatically
In [41]:
from bokeh.layouts import gridplot

grid_layout = gridplot(
    [[p_basic, p_cat], [p_scale, None]], sizing_mode='scale_width')

output_notebook()
show(grid_layout)
Loading BokehJS ...

Changing the Toolbar Location

  • Use toolbar_location to move the toolbar
In [42]:
from bokeh.layouts import gridplot

grid_layout = gridplot(
    children=[[p_basic, p_cat], [p_scale, None]],
    sizing_mode='scale_width',
    toolbar_location='left')

output_notebook()
show(grid_layout)
Loading BokehJS ...
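As an alternative to writing out the nested list with None as a placeholder, gridplot also accepts a flat list of plots together with an ncols argument. The sketch below assumes three placeholder scatter plots with made-up data and places the shared toolbar on the right. The accepted toolbar_location values are 'above', 'below', 'left', and 'right'.

```python
from bokeh.layouts import gridplot
from bokeh.plotting import figure

# Three placeholder plots with made-up data.
plots = []
for title in ('first', 'second', 'third'):
    p = figure(title=title)
    p.scatter([1, 2, 3], [3, 1, 2])
    plots.append(p)

# A flat list plus ncols=2 fills the grid row by row, leaving the last
# cell of the second row empty.
grid = gridplot(plots, ncols=2, toolbar_location='right')
```

Calling show(grid) displays the three plots with one merged toolbar along the right edge.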